Introduction

This is a multiple logistic regression model meant to help Math 325 students get better grades on the final. We will test several variables to see which efforts are most helpful for earning a good grade.

#Load in data
library(tidyverse)
library(pander)
library(ggplot2)

m325.test <- read.csv("C:/Users/Jared/OneDrive/Desktop/Math 425/Math 325 grade data/math325grades_test.csv")

m325.train <- read.csv("C:/Users/Jared/OneDrive/Desktop/Math 425/Math 325 grade data/math325grades_train.csv")


# Remove NA values
#m325.test <- na.omit(m325.test)
m325.train <- na.omit(m325.train)

#Mutate to flag whether the student earned an A or A- on the final (f90 = 1)


m325.train <- m325.train %>%
  mutate(f90 = ifelse(FinalGrade %in% c("A", "A-"), 1, 0))


m325.test <- m325.test %>%
  mutate(f90 = ifelse(FinalGrade %in% c("A", "A-"), 1, 0))
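As a quick sanity check of the recoding, `%in%` matches both grade spellings, so only A and A- map to 1 (a toy grade vector for illustration, not the real data):

```r
# Hypothetical grades to illustrate the f90 recoding used above
grades <- c("A", "A-", "B+", "C")
ifelse(grades %in% c("A", "A-"), 1, 0)  # 1 1 0 0
```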

Model and Interpretation

# glm1 <- glm(f90 ~ PredictedFinalExam + AssessmentQuizCompletionTotal +
#               SkillsQuizzesTotalCurrentScore + PeerReviewCurrentScore,
#               family = binomial, data = m325.train)
# 
# summary(glm1)



glm2 <- glm(f90 ~ AnalysesFinalScore + FinalExamFinalScore, family = binomial, data = m325.train)
#summary(glm2)

summary(glm2) %>% pander(caption="Multiple Logistic Regression Summary")
                      Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)            -155.6        70.66    -2.202     0.02766
AnalysesFinalScore      1.394       0.6325     2.204     0.02753
FinalExamFinalScore    0.4967       0.2302     2.157     0.03098

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 95.23 on 70 degrees of freedom
Residual deviance: 12.18 on 68 degrees of freedom
#AnalysesFinalScore and FinalExamFinalScore proved to be the most useful predictors of the candidates tried in the commented-out model above

This model uses the student's final analysis score and final exam score. The intercept is in effect 0 on the probability scale because \(e^{-155.59}\) is vanishingly small; all it says is that a student who gets a 0 on the final and on all of the analyses will not get an A. A one-unit increase in the final analysis score increases the log-odds of scoring above 90 by 1.3939, so a higher analysis score suggests the student is more likely to earn a higher grade. Likewise, a one-unit increase in the final exam score increases the log-odds of scoring above 90 by 0.4967, so a higher score on the final also suggests a higher grade is more likely.
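To see these coefficients on the probability scale, we can plug them into the logistic curve ourselves (a quick sketch using the estimates from the summary above; `p_hat` is just a helper defined here):

```r
# Fitted coefficients from the model summary above
b0 <- -155.5969
b1 <- 1.3939   # AnalysesFinalScore
b2 <- 0.4967   # FinalExamFinalScore

# Helper: predicted probability of an A or A-
p_hat <- function(analyses, final) plogis(b0 + b1 * analyses + b2 * final)

p_hat(95, 90)  # high scores: probability essentially 1
p_hat(80, 75)  # lower scores: probability near 0
```

The steepness shows up here: a 15-point swing in both scores moves the predicted probability from nearly 0 to nearly 1.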

The model seems to fit well, with a relatively low AIC of about 18 and a low residual deviance of about 12.
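One way to read the deviance drop is a McFadden-style pseudo-\(R^2\), computed from the null and residual deviances reported above (a rough summary measure, not a true \(R^2\)):

```r
# Deviances from the model summary above
null_dev  <- 95.23
resid_dev <- 12.18

pseudo_r2 <- 1 - resid_dev / null_dev
pseudo_r2  # about 0.87: the model accounts for most of the deviance
```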

Math 325 students: to improve your likelihood of getting an A or A-, resubmit your analyses and study for the final.

Graphs

#2D plots didn't work well: each curve holds the other predictor at 0, so with an intercept of -155.6 the fitted probability stays near 0 across the whole plotted range
# #final exam grades
# plot((f90==1) ~ FinalExamFinalScore, xlim=c(0,100), data=m325.train)
# curve(exp(-155.5969 + 0.4967*x)/(1+exp(-155.5969 + 0.4967*x)), add=TRUE)
# 
# 
# #analysis grades
# plot((f90==1) ~ AnalysesFinalScore, xlim=c(0,100), data=m325.train)
# curve(exp(-155.5969 + 1.3939*x)/(1+exp(-155.5969 + 1.3939*x)), add=TRUE)



# Plot with one predictor while using the other as a color-coded grouping
ggplot(m325.train, aes(x = AnalysesFinalScore, y = f90, color = factor(FinalExamFinalScore))) +
  geom_point(alpha = 0.6) +  # Scatter plot of actual data
  geom_smooth(method = "glm", method.args = list(family = "binomial"), se = FALSE) +
  scale_color_viridis_d() +  # Better color scale for visibility
  theme_bw() +
  labs(title = "Logistic Regression Model",
       x = "AnalysesFinalScore",
       y = "Probability of f90 = 1",
       color = "Final Exam Score")

This is a 2D plot of the regression using different colors to indicate how the model handles different final exam and analysis scores. However, the model is much clearer in 3D.

library(plotly)
library(reshape2)

# Graph resolution (determines smoothness of surface)
graph_reso <- 1

# Define axis values for predictors
axis_x <- seq(min(m325.train$AnalysesFinalScore), max(m325.train$AnalysesFinalScore), by = graph_reso)
axis_y <- seq(min(m325.train$FinalExamFinalScore), max(m325.train$FinalExamFinalScore), by = graph_reso)

# Create a grid of values
log_surface <- expand.grid(AnalysesFinalScore = axis_x, FinalExamFinalScore = axis_y)

# Predict probabilities using the logistic regression model
log_surface$Z <- predict(glm2, newdata = log_surface, type = "response")

# Convert predictions to matrix format for the surface plot
log_surface_matrix <- acast(log_surface, FinalExamFinalScore ~ AnalysesFinalScore, value.var = "Z")

# Create 3D scatter plot with surface
plot_ly(m325.train, 
        x = ~AnalysesFinalScore, 
        y = ~FinalExamFinalScore, 
        z = ~f90, 
        type = "scatter3d", 
        mode = "markers", 
        marker = list(size = 4, color = "red", opacity = 0.7)) %>%
  add_trace(z = log_surface_matrix,
            x = axis_x,
            y = axis_y,
            type = "surface",
            colorscale = "Viridis",
            opacity = 0.7) %>%
  layout(title = "3D Logistic Model Showing Probability of an A",
         scene = list(xaxis = list(title = "Analyses Scores"),
                      yaxis = list(title = "Final Exam Scores"),
                      zaxis = list(title = "Probability of an A or A-")))

The 3D plot shows how well the model fits the data. Just by looking at it we can see how steep the logistic regression's slope is, which suggests there is a sharp cutoff between A students and other students.

Validation

set.seed(100)
n <- nrow(m325.train)

keep <- sample(1:n, 0.65 * n) # keep about 65% of the rows for training
mytrain <- m325.train[keep, ]
mytest <- m325.train[-keep, ]

train.glm <- glm(f90 ~ AnalysesFinalScore + FinalExamFinalScore, family = binomial, data = mytrain)

mypreds <- predict(train.glm, mytest, type="response")

callit <- ifelse(mypreds > 0.9, 1, 0) # classification cutoff; 0.9 predicts an A only when the model is confident

#table(mytest$f90, callit)

pcc <- sum(mytest$f90 == callit) / length(callit) # proportion of held-out students classified correctly
print(pcc)
## [1] 0.92

Our model validated at 92% accuracy on the held-out rows. That seems fairly accurate and matches the sharp cutoff we saw in the 3D graph.
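The accuracy above is just the proportion of matches between the held-out labels and the thresholded calls; `table()` (commented out above) breaks the same comparison into a confusion matrix. A tiny hand-made example (hypothetical labels, not the real data):

```r
# Hypothetical truth and calls to illustrate the accuracy calculation
truth <- c(1, 1, 0, 0, 1)
calls <- c(1, 0, 0, 0, 1)

table(truth, calls)                  # confusion matrix: rows = truth, cols = calls
sum(truth == calls) / length(calls)  # 0.8 -> 4 of 5 classified correctly
```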

Conclusion

Higher scores on the analysis assignments will increase any Math 325 student's likelihood of getting an \(A\). Doing these well will also help prepare you for the final. If you get 100% on the analysis assignments you can do poorly on the final and still get an \(A\); that's not to say you shouldn't study for the final, since doing well on it also significantly increases the likelihood of an \(A\). Good luck, I hope you do well in the class.

#Generate the CSV of predicted grades needed for the test set

# # Predict probabilities for the test dataset using the trained model
# test_preds <- predict(train.glm, m325.test, type = "response")
# 
# # Convert probabilities to categorical grades
# m325.test$FinalGrade <- ifelse(test_preds > 0.9, "A", "Other")
# 
# # Save the results to a new CSV file
# write.csv(m325.test, "math325grades_predictions.csv", row.names = FALSE)

